Abstract

Given a set of $n$ points in the plane, each point having a positive weight, and an integer
$k > 0$, we present an optimal $O(n \log n)$-time deterministic algorithm to compute a step
function with $k$ steps that minimizes the maximum weighted vertical distance to the input
points. It matches the expected time bound of the best known randomized algorithm for
this problem. Our approach relies on Cole's improved parametric searching technique.

1 Problem formulation and previous works






A function $f : \mathbb{R} \to \mathbb{R}$ is called a $k$-step function if there exists a real sequence $a_1 < \dots < a_{k-1}$
such that the restriction of $f$ to each of the intervals $(-\infty, a_1)$, $[a_i, a_{i+1})$ and $[a_{k-1}, +\infty)$ is
a constant. A weighted point in the plane is a triplet $p = (x, y, w) \in \mathbb{R}^3$ where $(x, y) \in \mathbb{R}^2$
represents the coordinates of $p$ and $w > 0$ is a weight associated with $p$. We use $d(p, f)$ to
denote the weighted vertical distance between $p$ and $f$: that is, $d(p, f) = w \cdot |f(x) - y|$. For a
set of weighted points $P$, we define the distance $d(P, f)$ between $P$ and a step function $f$ as:

$$d(P, f) = \max\{d(p, f) \mid p \in P\}.$$
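As a concrete reading of these definitions, the following minimal sketch evaluates $d(P, f)$ for a step function given by its breakpoints and values. The representation (a sorted list `breakpoints` holding $a_1 < \dots < a_{k-1}$ and a list `values` with $k$ entries) and the function names are assumptions made for illustration; they do not come from the paper.

```python
from bisect import bisect_right

def step_value(breakpoints, values, x):
    """Value at x of the step function with the given breakpoints and values.

    bisect_right counts the breakpoints <= x, so x = a_i falls in the
    half-open interval [a_i, a_{i+1}), matching the definition above.
    """
    return values[bisect_right(breakpoints, x)]

def distance(points, breakpoints, values):
    """Weighted vertical distance d(P, f) for points given as (x, y, w)."""
    return max(w * abs(step_value(breakpoints, values, x) - y)
               for x, y, w in points)
```

For instance, with the 2-step function equal to 0 on $(-\infty, 1)$ and 1 on $[1, +\infty)$, the call `distance([(0, 0, 1), (1, 2, 1), (2, 0, 1)], [1], [0, 1])` evaluates the maximum of the three weighted vertical gaps.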

This histogram construction problem is motivated by database applications, where one
wants to find a compact representation of the dataset that fits into main memory, so as to
optimize query processing [5]. The unweighted version (that is, when $w_i = 1$ for all $i$) has
been studied extensively, until optimal algorithms were found; see our previous article [1] and
references therein.

The weighted case was first considered by Guha and Shim [5], who gave an $O(n \log n +
k^2 \log^6 n)$-time algorithm. Lopez and Mayster [9] gave an $O(n^2)$-time algorithm, which is thus
faster for small values of $k$. Then Fournier and Vigneron [3] gave an $O(n \log^4 n)$ algorithm,
which was further improved to $O(\min(n \log^2 n, n \log n + k^2 \log^2 \frac{n}{k} \log n \log \log n))$ by Chen and
Wang [2]. Eventually, an optimal randomized $O(n \log n)$-time algorithm was obtained by Liu [8].
In this note, we present a deterministic counterpart to Liu's algorithm, which runs in $O(n \log n)$
time. This time bound is optimal as the unweighted case already requires $\Omega(n \log n)$ time [1].
Our approach combines ideas from previous work on this problem [5, 6] with the improved
parametric searching technique by Cole [3].
2 An optimal deterministic algorithm



We consider an input set of weighted points $P = \{(x_i, y_i, w_i) \mid 1 \le i \le n\}$, and an integer $k > 0$.
Let $\varepsilon^*$ denote the optimal distance from $P$ to a $k$-step function, that is,

$$\varepsilon^* = \min\{d(P, f) \mid f \text{ is a } k\text{-step function}\}.$$

Karras et al. [6] made the following observation:

Lemma 1 Given a set of $n$ weighted points sorted with respect to their $x$-coordinate, an integer
$k > 0$ and a real $\varepsilon > 0$, one can decide in time $O(n)$ if $\varepsilon < \varepsilon^*$.

The above lemma is obtained by a greedy method, going through the points from left to right
and creating a new step whenever necessary. More than $k$ steps are created along this process
if and only if $\varepsilon < \varepsilon^*$. A consequence is that once $\varepsilon^*$ is known, an optimal $k$-step function can
be built in linear time by running this algorithm on $\varepsilon = \varepsilon^*$.
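The greedy test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the name `fits_within` and the `(x, y, w)` tuple layout are hypothetical, points sharing an $x$-coordinate are forced into the same step, and the arithmetic is ordinary floating point. Each point admits the interval $[y - \varepsilon/w,\ y + \varepsilon/w]$ of function values, and a new step is opened only when the running intersection becomes empty.

```python
from itertools import groupby

def fits_within(points, k, eps):
    """Return True if at most k steps achieve weighted distance <= eps.

    points must be sorted by x and have positive weights. Under the
    conventions of Lemma 1, a False result corresponds to eps < eps*.
    """
    steps, current = 0, None  # current = interval of values allowed on this step
    for _, group in groupby(points, key=lambda p: p[0]):
        group = list(group)
        # Values allowed by every point at this x-coordinate.
        lo = max(y - eps / w for _, y, w in group)
        hi = min(y + eps / w for _, y, w in group)
        if lo > hi:
            return False  # no function value works at this x, whatever k is
        if current is not None and max(current[0], lo) <= min(current[1], hi):
            # Extend the current step by shrinking its interval.
            current = (max(current[0], lo), min(current[1], hi))
        else:
            # Open a new step starting at this x-coordinate.
            steps, current = steps + 1, (lo, hi)
            if steps > k:
                return False
    return True
```

Recording the $x$-coordinate at which each step is opened, together with any value in the final interval of that step, would yield the step function itself.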

A second observation, made by Guha and Shim [5], is the following. The distance of a point
$p = (x_i, y_i, w_i)$ to the constant function $c$ is equal to $d(p, c) = w_i \cdot |c - y_i|$. Hence, for a
nonempty subset $Q \subset P$ of the input points, the distance $\min\{d(Q, f) \mid f \text{ is a constant function}\}$
between $Q$ and the closest constant function is given by the minimum $y$-coordinate of the points
in the region $U_Q$ defined as:

$$U_Q = \bigcap_{(x_i, y_i, w_i) \in Q} \{(x, y) \mid y \ge w_i \cdot |x - y_i|\}.$$

In other words, the distance between $Q$ and the closest 1-step function is the $y$-coordinate of the
lowest vertex in the upper envelope $U_Q$ of the lines with equation $y = \pm w_i (x - y_i)$ corresponding
to the points $(x_i, y_i, w_i) \in Q$. (There is only one lowest vertex as the slopes $\pm w_i$ are nonzero.)
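For intuition, the height of this lowest vertex can be computed by brute force. The set of constants $c$ with $w_i |c - y_i| \le \varepsilon$ is an interval, and intervals on a line share a common point exactly when every two of them do; two such intervals meet if and only if $\varepsilon \ge w_i w_j |y_i - y_j| / (w_i + w_j)$, which is the height at which the rising line of one point crosses the falling line of the other. The quadratic-time sketch below takes the largest such value. The helper name `one_step_distance` is hypothetical, and this is an illustration rather than a step of the paper's algorithm.

```python
from itertools import combinations

def one_step_distance(points):
    """Distance from the points (x, y, w) to their closest constant function."""
    best = 0.0
    for (_, yi, wi), (_, yj, wj) in combinations(points, 2):
        # Height of the crossing between the two lines of opposite slope
        # attached to points i and j.
        best = max(best, wi * wj * abs(yi - yj) / (wi + wj))
    return best
```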

An immediate consequence is the following. For $i \in \{1, \dots, n\}$, let $\ell_{2i-1}$ be the line defined
by the equation $y = w_i (x - y_i)$, and $\ell_{2i}$ the line defined by $y = -w_i (x - y_i)$. Let $L = \{\ell_1, \dots, \ell_{2n}\}$.
We denote by $A(L)$ the arrangement of these lines. (See Figure 1.)

Lemma 2 The optimal distance $\varepsilon^*$ from a set of weighted points $P$ to a $k$-step function is the
$y$-coordinate of a vertex of $A(L)$.
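Lemma 2 already gives a simple reference procedure, useful for checking a faster implementation on small inputs: list the $y$-coordinates of all $O(n^2)$ vertices of $A(L)$, sort them, and binary search with the greedy test. The sketch below does this in $O(n^2 \log n)$ time; it is not the $O(n \log n)$ algorithm of this note. The names `feasible` and `optimal_distance` are hypothetical, the input is assumed nonempty with $k \ge 1$, and `Fraction` is used so that the candidate values are exact.

```python
from fractions import Fraction
from itertools import combinations, groupby

def feasible(points, k, eps):
    """Greedy test: can at most k steps stay within weighted distance eps?"""
    steps, cur = 0, None
    for _, group in groupby(points, key=lambda p: p[0]):
        group = list(group)
        lo = max(y - eps / w for _, y, w in group)
        hi = min(y + eps / w for _, y, w in group)
        if lo > hi:
            return False
        if cur is not None and max(cur[0], lo) <= min(cur[1], hi):
            cur = (max(cur[0], lo), min(cur[1], hi))
        else:
            steps, cur = steps + 1, (lo, hi)
            if steps > k:
                return False
    return True

def optimal_distance(points, k):
    """Smallest vertex height of the arrangement that passes the greedy test."""
    pts = sorted((Fraction(x), Fraction(y), Fraction(w)) for x, y, w in points)
    # Each line y = s * (x - c) is stored as (s, c); the set drops duplicates.
    lines = {(s, y) for _, y, w in pts for s in (w, -w)}
    heights = {s1 * s2 * (c1 - c2) / (s1 - s2)
               for (s1, c1), (s2, c2) in combinations(lines, 2) if s1 != s2}
    cands = sorted(h for h in heights if h >= 0)
    lo, hi = 0, len(cands) - 1
    while lo < hi:  # feasibility is monotone in eps
        mid = (lo + hi) // 2
        if feasible(pts, k, cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```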




Figure 1: The arrangement $A(L)$ and the upper envelope (shaded) of a subset $Q \subset L$ (bold). The
$y$-coordinate of the lowest point $u$ in $U_Q$ gives the minimum distance from $Q$ to a 1-step function.

The deterministic algorithm presented here will be obtained by performing a search on the
vertices of $A(L)$, calling the decision procedure of Lemma 1 only $O(\log n)$ times, and with an
overall extra time $O(n \log n)$. We achieve it by applying Cole's improved parametric searching
technique [3]:
Theorem 3 (Cole) Consider the problem of sorting an array $A[1, \dots, n]$ of size $n$. Assume
the following two conditions hold:

(i) There is an $O(n)$ time algorithm to test if $A[i] \le A[j]$.
(ii) There exists a linear order $\preceq$ on the set $\{(i, j) \mid 1 \le i < j \le n\}$ such that

$$(i, j) \preceq (i', j') \implies (A[i] \le A[j] \implies A[i'] \le A[j'])$$
and such that we can decide if $(i, j) \preceq (i', j')$ in $O(1)$ time.
Then, the array $A$ can be sorted in $O(n \log n)$ time.

We briefly explain Cole's method. Recall that a sorting network [?] for $n$ elements is a
sequence $L_1, \dots, L_d$, each $L_i$ being a set of comparisons on disjoint inputs in $\{1, \dots, n\}$. On
an input array $A[1, \dots, n]$, the network operates as follows: for each level $p$ from 1 to $d$, the
comparisons in $L_p$ are performed in parallel, and the two elements of $A$ corresponding to each
comparison are swapped if they appear in the wrong order. If the sorting network is correct,
the elements of $A$ are output in sorted order after the last level.

The algorithm from Theorem 3 works as follows. First build a sorting network of depth
$O(\log n)$ in deterministic $O(n \log n)$ time [HllO]. During the course of the sorting algorithm,
each comparison in the network is marked with one of the following labels: resolved, active or
inactive. In the beginning, the comparisons at the first level $L_1$ of the network are marked
active, while all others are inactive. The weight $1/4^p$ is assigned to each active node at level $p$.
Repeat the following until all nodes are resolved:

- Compute the weighted median $(i_m, j_m)$ of all active comparisons with respect to the order
  $\preceq$ defined in (ii);

- Decide if $A[i_m] \le A[j_m]$ with the algorithm from (i). This resolves a weighted half of the active
  comparisons. Swap the corresponding elements of $A$ when they are in the wrong order, and mark
  these comparisons as resolved. Mark all inactive comparisons having their two inputs
  resolved as active.

It can be proved that at most $O(n)$ nodes are active at any step. Since the weighted median can
be computed in linear time, each step is performed in $O(n)$ time. Moreover, it can be shown
that the algorithm terminates in at most $O(\log n)$ steps. So overall, this procedure sorts an
array of size $n$ in $O(n \log n)$ time.
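The weighted median used at each step can be illustrated by the sketch below. It sorts the active comparisons, which costs $O(m \log m)$ for $m$ items; the linear bound quoted above requires a selection-based routine instead, so this is a simplification. The function name, the `(value, level)` item layout and the callback parameters are assumptions made for illustration.

```python
from fractions import Fraction

def weighted_median(items, weight, key):
    """Return the first item, in key order, whose prefix weight reaches half the total."""
    ordered = sorted(items, key=key)
    total = sum(weight(item) for item in ordered)
    prefix = 0
    for item in ordered:
        prefix += weight(item)
        if 2 * prefix >= total:
            return item
    raise ValueError("weighted_median needs at least one item")

# Example: comparisons tagged (critical value, level p), weighted by 1/4**p.
active = [(1.0, 1), (2.0, 1), (3.0, 2), (4.0, 2)]
median = weighted_median(active,
                         weight=lambda c: Fraction(1, 4 ** c[1]),
                         key=lambda c: c[0])
```

Testing the critical value of the returned comparison against the unknown optimum settles every active comparison lying on one side of it, and those comparisons carry at least half of the active weight.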

We are now ready to give our algorithm for fitting a step function to a weighted point set: 

Theorem 4 Given a set $P$ of $n$ weighted points in the plane and an integer $k > 0$, a $k$-step
function $f$ minimizing $d(P, f)$ can be computed in $O(n \log n)$ deterministic time.

Proof: Let $\theta : \mathbb{R} \to \{0, 1\}$ be the mapping defined by $\theta(\varepsilon) = 0$ if $\varepsilon < \varepsilon^*$ and $\theta(\varepsilon) = 1$
otherwise. First we sort the points of $P$ with respect to their $x$-coordinate. Given $\varepsilon$, this allows
us to compute $\theta(\varepsilon)$ in time $O(n)$, using the decision algorithm of Lemma 1.

We denote by $\pi : \mathbb{R}^2 \to \mathbb{R}$ the projection onto the $y$-coordinate axis. Recall the definition of
the lines $L = \{\ell_1, \dots, \ell_{2n}\}$. For $y \in \mathbb{R}$, let $\ell_i(y)$ be the unique $x \in \mathbb{R}$ such that $(x, y) \in \ell_i$; it is
well-defined since no line is parallel to the $x$-axis. By Lemma 2, it holds that

$$\varepsilon^* = \min\{\pi(v) \mid v \text{ vertex of } A(L),\ \theta(\pi(v)) = 1\}.$$

Although $\varepsilon^*$ is not known, we shall sort the set $\{\ell_i(\varepsilon^*) \mid 1 \le i \le 2n\}$ using Cole's method.
Note that $L$ could be of cardinality smaller than $2n$ if some lines are identical. We discard
duplicate lines and order the remaining ones with respect to their order at $-\infty$. That is, $L = \{\ell_1, \dots, \ell_m\}$ and
for all $0 < i < m$, it holds that $\ell_i(y) < \ell_{i+1}(y)$ when $y \to -\infty$. Let us check that conditions (i)
and (ii) of Theorem 3 hold.

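Condition (i): Given $i < j$, we want to decide if $\ell_i(\varepsilon^*) \le \ell_j(\varepsilon^*)$. If lines $\ell_i$ and $\ell_j$ are parallel,
the ordering on the lines ensures that $\ell_i(y) < \ell_j(y)$ for all $y$ and in particular for $\varepsilon^*$. Otherwise,
we compute $y_0 = \pi(\ell_i \cap \ell_j)$ in time $O(1)$, then decide if $y_0 < \varepsilon^*$ in time $O(n)$ by Lemma 1. If
$y_0 < \varepsilon^*$, then $\ell_i(\varepsilon^*) > \ell_j(\varepsilon^*)$; otherwise $\ell_i(\varepsilon^*) \le \ell_j(\varepsilon^*)$.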
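The ordering at $-\infty$ and the test of condition (i) can be made concrete as follows. In this sketch a line $y = s (x - c)$ is stored as the pair `(s, c)`, so that $\ell(y) = c + y/s$, and the decision procedure of Lemma 1 is passed in as a callback `below_optimum(y0)` answering whether $y_0 < \varepsilon^*$. All names and the representation are assumptions made for illustration.

```python
def x_at(line, y):
    """x-coordinate of the point of the line y = s * (x - c) at height y."""
    s, c = line
    return c + y / s

def order_at_minus_infinity(lines):
    """Drop duplicate lines and sort by x-coordinate as y tends to -infinity.

    For very negative y the term y / s dominates, so lines are ranked by
    decreasing 1/s, with ties (parallel lines) broken by increasing c.
    """
    return sorted(set(lines), key=lambda line: (-1 / line[0], line[1]))

def intersection_y(li, lj):
    """Height y0 of the intersection of two non-parallel lines."""
    (si, ci), (sj, cj) = li, lj
    return si * sj * (ci - cj) / (si - sj)

def precedes_at_optimum(li, lj, below_optimum):
    """Condition (i): is li(eps*) <= lj(eps*), given li before lj at -infinity?"""
    if li[0] == lj[0]:
        return True  # parallel lines keep their order at every height
    # The two lines swap exactly at height y0.
    return not below_optimum(intersection_y(li, lj))
```

Each call costs one evaluation of the decision procedure, which is what makes the comparison an $O(n)$-time test in the sense of Theorem 3.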

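Condition (ii): For $i < j$, let us define $\pi(\ell_i, \ell_j) = \pi(\ell_i \cap \ell_j)$ if lines $\ell_i$ and $\ell_j$ intersect, and
$\pi(\ell_i, \ell_j) = +\infty$ otherwise. (Or equivalently $\pi(\ell_i, \ell_j) = \sup\{y \in \mathbb{R} \mid \ell_i(y) < \ell_j(y)\}$.) Let $\preceq$ be
the order on the set $\{(i, j) \mid 1 \le i < j \le m\}$ defined by:

$$(i, j) \preceq (i', j') \text{ if and only if } \pi(\ell_i, \ell_j) \le \pi(\ell_{i'}, \ell_{j'}).$$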

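Assume $(i, j) \preceq (i', j')$ and $\ell_i(\varepsilon^*) \le \ell_j(\varepsilon^*)$. (See Figure 2.) From the second condition, it holds
that $\varepsilon^* \le \pi(\ell_i, \ell_j)$; then the first condition yields $\varepsilon^* \le \pi(\ell_{i'}, \ell_{j'})$, and it follows that $\ell_{i'}(\varepsilon^*) \le \ell_{j'}(\varepsilon^*)$.
Moreover, the order $\preceq$ can obviously be computed in $O(1)$ time.

Figure 2: A case where $(i, j) \preceq (i', j')$ and $\ell_i(\varepsilon^*) \le \ell_j(\varepsilon^*)$.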

Hence Theorem 3 allows us to compute a permutation $\sigma$ of $\{1, \dots, m\}$ such that

$$\ell_{\sigma(1)}(\varepsilon^*) \le \ell_{\sigma(2)}(\varepsilon^*) \le \dots \le \ell_{\sigma(m)}(\varepsilon^*)$$

in time $O(n \log n)$. By Lemma 2, it holds that $\varepsilon^* \in \{\pi(\ell_{\sigma(i)} \cap \ell_{\sigma(i+1)}) \mid 0 < i < m\}$. After
sorting this set, we perform a binary search using the linear-time decision algorithm, and thus
we compute $\varepsilon^*$ in $O(n \log n)$ time. Finally, we run the decision algorithm on $\varepsilon^*$, which gives an
optimal $k$-step function for $P$. □

3 Concluding remarks 

When the input points are given in unsorted order, our algorithm is optimal by reduction from
sorting [3]. So an intriguing question is whether there exists an $o(n \log n)$-time algorithm when
the input points are given in sorted order. For instance, in the unweighted case, a linear-time
algorithm exists if the points are sorted according to their $x$-coordinates [1].

Our algorithm is mainly of theoretical interest, as Cole's parametric searching technique
relies on a sorting network with $O(\log n)$ depth; all known constructions for such networks
involve large constants. So it would be interesting to have a practical $O(n \log n)$-time deterministic
algorithm.